The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Recently, the success of pre-training in text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is the modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all of common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new one. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
translated by 谷歌翻译
Robust Model-Agnostic Meta-Learning (MAML) is usually adopted to train a meta-model which may fast adapt to novel classes with only a few exemplars and meanwhile remain robust to adversarial attacks. The conventional solution for robust MAML is to introduce robustness-promoting regularization during meta-training stage. With such a regularization, previous robust MAML methods simply follow the typical MAML practice that the number of training shots should match with the number of test shots to achieve an optimal adaptation performance. However, although the robustness can be largely improved, previous methods sacrifice clean accuracy a lot. In this paper, we observe that introducing robustness-promoting regularization into MAML reduces the intrinsic dimension of clean sample features, which results in a lower capacity of clean representations. This may explain why the clean accuracy of previous robust MAML methods drops severely. Based on this observation, we propose a simple strategy, i.e., increasing the number of training shots, to mitigate the loss of intrinsic dimension caused by robustness-promoting regularization. Though simple, our method remarkably improves the clean accuracy of MAML without much loss of robustness, producing a robust yet accurate model. Extensive experiments demonstrate that our method outperforms prior arts in achieving a better trade-off between accuracy and robustness. Besides, we observe that our method is less sensitive to the number of fine-tuning steps during meta-training, which allows for a reduced number of fine-tuning steps to improve training efficiency.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Recently, numerous studies have investigated cooperative traffic systems using the communication among vehicle-to-everything (V2X). Unfortunately, when multiple autonomous vehicles are deployed while exposed to communication failure, there might be a conflict of ideal conditions between various autonomous vehicles leading to adversarial situation on the roads. In South Korea, virtual and real-world urban autonomous multi-vehicle races were held in March and November of 2021, respectively. During the competition, multiple vehicles were involved simultaneously, which required maneuvers such as overtaking low-speed vehicles, negotiating intersections, and obeying traffic laws. In this study, we introduce a fully autonomous driving software stack to deploy a competitive driving model, which enabled us to win the urban autonomous multi-vehicle races. We evaluate module-based systems such as navigation, perception, and planning in real and virtual environments. Additionally, an analysis of traffic is performed after collecting multiple vehicle position data over communication to gain additional insight into a multi-agent autonomous driving scenario. Finally, we propose a method for analyzing traffic in order to compare the spatial distribution of multiple autonomous vehicles. We study the similarity distribution between each team's driving log data to determine the impact of competitive autonomous driving on the traffic environment.
translated by 谷歌翻译
现在,通过复杂的神经网络模型(例如蒙版的神经语言模型(MNLM))学习了许多上下文化的单词表示形式,这些模型由巨大的神经网络结构组成,并经过训练以恢复蒙面文本。这样的表示表明在某些阅读理解(RC)任务中表现出超人的表现,这些任务在给出问题的上下文中提取了适当的答案。但是,由于许多模型参数,确定在MNLM中训练的详细知识是具有挑战性的。本文提供了有关MNLMS中包含的常识性知识的新见解和经验分析。首先,我们使用诊断测试来评估常识性知识是否在MNLMS中进行了适当的培训。我们观察到,在MNLMS中没有适当训练很多常识性知识,并且MNLMS并不经常准确地理解关系的语义含义。此外,我们发现基于MNLM的RC模型仍然容易受到需要常识知识的语义变化的影响。最后,我们发现了未经训练的知识的基本原因。我们进一步建议,利用外常识性知识存储库可以是一个有效的解决方案。我们说明了通过在受控实验中以外常识性知识存储库来丰富文本的经文,以克服基于MNLM的RC模型的局限性的可能性。
translated by 谷歌翻译
精确地重建由单个图像的各种姿势和服装引起的精确复杂的人类几何形状非常具有挑战性。最近,基于像素对齐的隐式函数(PIFU)的作品已迈出了一步,并在基于图像的3D人数数字化上实现了最先进的保真度。但是,PIFU的培训在很大程度上取决于昂贵且有限的3D地面真相数据(即合成数据),从而阻碍了其对更多样化的现实世界图像的概括。在这项工作中,我们提出了一个名为selfpifu的端到端自我监督的网络,以利用丰富和多样化的野外图像,在对无约束的内部图像进行测试时,在很大程度上改善了重建。 SelfPifu的核心是深度引导的体积/表面感知的签名距离领域(SDF)学习,它可以自欺欺人地学习PIFU,而无需访问GT网格。整个框架由普通估计器,深度估计器和基于SDF的PIFU组成,并在训练过程中更好地利用了额外的深度GT。广泛的实验证明了我们自我监督框架的有效性以及使用深度作为输入的优越性。在合成数据上,与PIFUHD相比,我们的交叉点(IOU)达到93.5%,高18%。对于野外图像,我们对重建结果进行用户研究,与其他最先进的方法相比,我们的结果的选择率超过68%。
translated by 谷歌翻译
在本文中,我们提出了一种新颖的端到端用户定义的关键字发现方法,该方法利用语言和文本序列之间的语言相应模式。与需要语音关键字注册的先前方法不同,我们的方法将输入查询与注册文本关键字序列进行比较。为了将音频和文本表示形式放置在共同的潜在空间中,我们采用了一种基于注意力的跨模式匹配方法,该方法以端到端的方式进行了训练,并具有单调匹配的损失和关键字分类损失。我们还利用了声学嵌入网络的拖延损失来改善嘈杂环境中的鲁棒性。此外,我们介绍了Libriphrase数据集,这是一种基于LibrisPeech的新短语数据集,用于有效训练关键字斑点模型。与其他单模式和跨模式基线相比,我们提出的方法在各种评估集上取得了竞争性结果。
translated by 谷歌翻译
总生存时间(OS)时间是神经胶质瘤情况最重要的评估指数之一。多模式磁共振成像(MRI)扫描在神经胶质瘤预后OS时间的研究中起重要作用。为多模式MRI问题的OS时间预测提出了几种基于学习的方法。但是,这些方法通常在深度学习网络开始或结束时融合多模式信息,并且缺乏来自不同尺度的特征。此外,网络末尾的融合始终适应全球(例如,在全球平均池输出串联后完全连接)或与局部(例如,双线性池)的融合,这会失去与全球局部的局部信息。在本文中,我们提出了一种用于对脑肿瘤患者的多模式OS时间预测的新方法,该方法包含在不同尺度上引入的改进的非局部特征融合模块。我们的方法比当前最新方法获得了相对8.76%的改善(0.6989 vs. 0.6426的精度)。广泛的测试表明,我们的方法可以适应缺失方式的情况。该代码可在https://github.com/tangwen920812/mmmna-net上找到。
translated by 谷歌翻译
在临床实践中,由于较短的获取时间和较低的存储成本,通常使用了平面分辨率低的各向异性体积医学图像。然而,粗分辨率可能导致医生或计算机辅助诊断算法的医学诊断困难。基于深度学习的体积超分辨率(SR)方法是改善分辨率的可行方法,其核心是卷积神经网络(CNN)。尽管进展最近,但这些方法受到卷积运算符的固有属性的限制,卷积运算符忽略内容相关性,无法有效地对远程依赖性进行建模。此外,大多数现有方法都使用伪配合的体积进行训练和评估,其中伪低分辨率(LR)体积是通过简单的高分辨率(HR)对应物的简单降解而产生的。但是,伪和现实LR之间的域间隙导致这些方法在实践中的性能不佳。在本文中,我们构建了第一个公共实用数据集RPLHR-CT作为体积SR的基准,并通过重新实现四种基于CNN的最先进的方法来提供基线结果。考虑到CNN的固有缺点,我们还提出了基于注意力机制的变压器体积超分辨率网络(TVSRN),完全与卷积分配。这是首次将纯变压器用于CT体积SR的研究。实验结果表明,TVSRN在PSNR和SSIM上的所有基准都显着胜过。此外,TVSRN方法在图像质量,参数数量和运行时间之间取得了更好的权衡。数据和代码可在https://github.com/smilenaxx/rplhr-ct上找到。
translated by 谷歌翻译